Teaching Guide Lookup



Academic Year/course: 2022/23

626 - Máster Universitario en Biofísica y Biotecnología Cuantitativa / Master in Biophysics and Quantitative Biotechnology

68461 - Big Data in Biology


Syllabus Information

Academic Year:
2022/23
Subject:
68461 - Big Data in Biology
Faculty / School:
100 - Facultad de Ciencias
Degree:
626 - Máster Universitario en Biofísica y Biotecnología Cuantitativa / Master in Biophysics and Quantitative Biotechnology
ECTS:
6.0
Year:
01
Semester:
Second semester
Subject Type:
Optional
Module:
---

1. General information

1.1. Aims of the course

The objectives of the course are the following:

  • Goal 1: Identify the main fields and applications in contemporary biomedical research where the volume, variety and generation velocity of data make data science approaches pivotal.

  • Goal 2: Understand the basics of data production in Next Generation Sequencing technologies (NGS).

  • Goal 3: Study the most important steps of a paradigmatic RNA-sequencing data analysis pipeline, as a key example of data science application in contemporary Biology; from quality control to statistical modeling and biological interpretation.

  • Goal 4: Understand the basics of single-cell sequencing technologies: describe data generation techniques and the data features that differ from bulk sequencing data, and describe and use analytical approaches to deal with them.

  • Goal 5: Establish a critical debate around ethical and epistemological aspects related to the role of the bioinformatician / data scientist in contemporary computational genomics.

These goals are aligned with the following Sustainable Development Goals (SDGs) of the United Nations 2030 Agenda (https://www.un.org/sustainabledevelopment/es/), such that acquiring the learning results of this course provides training and skills that contribute, to some extent, to their achievement:

  • Goal 2: zero hunger.

  • Goal 3: good health and well-being.

  • Goal 5: gender equality.

  • Goal 9: industry, innovation and infrastructure.

  • Goal 14: life below water.

  • Goal 15: life on land.

1.2. Context and importance of this course in the degree

This course offers a first approach to data science applications in current biomedical research. We first provide a general overview of some of the fields where big data approaches are most frequently used today. After this general contextualization, we focus on one of the fields where data variety, volume, and ever-increasing production velocity and affordability are most remarkable, and where the epistemological implications are deepest: the study of -omics data generated using NGS technologies. Focusing on transcriptomics as one of the most popular -omics data modalities, we will discuss how statistical models, applied to the analysis of these big datasets at both bulk and single-cell resolution, can shed light on fundamental aspects of cell biology, evolutionary biology, and immunology, among others.

From a conceptual point of view, the systems and biological concepts studied in this course are complementary to the contents of the course on Systems and Synthetic Biology. The study of the central dogma of molecular biology, and more specifically the regulation of gene expression and its variation across cell types and conditions, is central to both courses, even though the methodological toolboxes presented in each of them are entirely different: Boolean and continuous models describing time-series evolution in longitudinal designs in Systems and Synthetic Biology, versus statistical modelling of cross-sectional data cohorts in this course. From a methodological point of view, the methods presented in this course complement some of the tools presented in the course on Biostatistics and Bio-informatics.

1.3. Recommendations to take this course

The course, like the master's programme as a whole, is conceived for an interdisciplinary audience of students coming from backgrounds in formal/quantitative sciences and biomedical programs alike. To complete this course, it is highly recommended to prioritize developing strong R programming skills during the first-semester course "Introduction to computational methods in Biology", and to take this course together with the optional course "Bio-statistics and Bio informatics", where many statistical concepts that are central here are also presented.

2. Learning goals

2.1. Competences

Basic and General

1. Order, analyze critically, and interpret information from different types of sources.

2. Develop the learning skills needed to continue studying autonomously new data, methods and applications.

3. Communicate results clearly and unambiguously, using suitable presentation tools and within the limitations imposed by time or space.

4. Learn to manage the resources and time available for solving a problem or developing a project.

5. Use quantitative data to discriminate between complex hypotheses, and translate data analysis results into biologically articulated conclusions.

6. Develop critical judgement with respect to the results of one’s own data analyses.

7. Get acquainted with multidisciplinary research environments, and learn an efficient language suitable for communication within multidisciplinary collaborations in Bio-medicine.

 

Specific

1. Identify the main fields of application of data science in Biology and Medicine.

2. Acquire fluency in the basic computational management of large datasets of biological information.

3. Implement complete and reproducible analysis pipelines for bulk and single-cell RNA-seq data.

4. Identify the main features of an experimental design in -omics data, and translate them into optimal modeling strategies.

5. Acknowledge the ethical and societal implications of the decision-making process in biological data analysis.

2.2. Learning goals

At the end of the course, the student will know the main fields of application of data science in current biomedical research. Furthermore, the student will know the basics of data production in NGS technologies, and will be able to design and implement a complete pipeline for the analysis of RNA-seq transcriptomic data, from quality control (QC) and mapping to statistical modeling and critical, biological interpretation of the results. The student will recognize the main differences between bulk and single-cell sequencing data, and will be familiar with the ethical, societal, and epistemological implications of the data analysis tasks covered in the course.
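As a rough illustration of the last stages of such a pipeline, the sketch below normalizes a toy table of read counts by library size (counts per million) and computes a log2 fold change between two conditions. It is written in Python for brevity, although the course itself works in R. The gene names, counts, condition labels and function names are all invented for this example, and a real analysis would rely on dedicated packages and proper statistical models rather than a plain ratio of means.

```python
import math

# Hypothetical toy data: raw read counts per gene across four samples.
counts = {
    "geneA": [100, 120, 400, 380],
    "geneB": [500, 480, 510, 490],
    "geneC": [400, 400, 90, 130],
}
# Hypothetical condition label of each sample (same order as the columns).
conditions = ["control", "control", "treated", "treated"]


def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million."""
    n_samples = len(next(iter(counts.values())))
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [c * 1e6 / library_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def log2_fold_change(cpm, conditions, reference="control", pseudocount=1.0):
    """log2 ratio of mean CPM in non-reference vs. reference samples."""
    result = {}
    for gene, row in cpm.items():
        ref = [v for v, c in zip(row, conditions) if c == reference]
        other = [v for v, c in zip(row, conditions) if c != reference]
        mean_ref = sum(ref) / len(ref)
        mean_other = sum(other) / len(other)
        # The pseudocount avoids taking the logarithm of zero.
        result[gene] = math.log2(
            (mean_other + pseudocount) / (mean_ref + pseudocount)
        )
    return result


cpm = counts_per_million(counts)
lfc = log2_fold_change(cpm, conditions)
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:+.2f}")
```

In this toy table geneA comes out higher and geneC lower in the treated samples, while geneB barely changes. Deciding whether such differences are statistically meaningful is the modeling step that the course addresses.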

2.3. Importance of learning goals

Fueled by technical developments, many fields in current biomedical research have turned into data-intensive disciplines that require substantial expertise in the storage, handling and, very importantly, analysis and interpretation of big datasets. A paramount example is that of NGS technologies, which nowadays allow genomic data of many different types to be retrieved for ever-increasing sample sizes at an affordable cost. This has revolutionized the study of genomes, their regulation, their variation within and across species, and their implications in fields ranging from evolutionary biology to clinical care. The explosive growth in the use of NGS data in biomedical research has been followed by an intense demand, in both industry and academia, for technicians and scientists who combine know-how in statistics, computation and data science techniques with the background in biological concepts needed not only to analyze big datasets of biological information, but also to interpret them properly from a biological point of view. The structure of this course is oriented towards the acquisition of such a profile, which is in strong demand in contemporary biomedical research.

3. Assessment (1st and 2nd call)

3.1. Assessment tasks (description of tasks, marking system and assessment criteria)

1: (40% of the final grade). Continuous evaluation of the student's progress during the practical and theoretical sessions, through the marking of practice reports and through direct interaction in the classroom, rewarding active participation during lectures and practices.

2: (20% of the final grade). Seminars and/or oral presentations to the teacher. Towards the end of the course, the teacher will provide materials on a practical case, on which the students will prepare a report that they will later present orally to the teacher. The coherence and completeness of the proposed analyses, the understanding of the subject, and the clarity of the presentation will be assessed.

3: (40% of the final grade). Written exam on the topics discussed throughout the course.
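The weighting above can be sketched as a simple weighted sum. The snippet below is only an illustration: the 40/20/40 weights come from this guide, but the 0-10 marking scale, the example marks, the component names and the absence of any minimum mark per task are assumptions not stated here.

```python
# Weights taken from the three assessment tasks listed above.
WEIGHTS = {
    "continuous_evaluation": 0.40,
    "seminar": 0.20,
    "written_exam": 0.40,
}


def final_grade(marks):
    """Weighted sum of the three task marks (assumed to be on a 0-10 scale)."""
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must cover exactly the three assessment tasks")
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS)


# Hypothetical marks for one student.
example = {"continuous_evaluation": 8.0, "seminar": 7.0, "written_exam": 6.0}
print(round(final_grade(example), 2))  # 0.4*8 + 0.2*7 + 0.4*6 = 7.0
```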

 

4. Methodology, learning tasks, syllabus and resources

4.1. Methodological overview

The methodology of this course is oriented towards achieving the learning objectives through a wide range of teaching and learning tasks: lectures; practical sessions, where lectures or analogous material are combined with the step-by-step execution of R Markdown scripts by the teacher; and practice sessions in the computer laboratory. The Moodle virtual platform will be used to distribute lecture notes, to set practices, and to broadcast relevant news. Students are expected to participate actively in class throughout the semester. Course material: notes written by the lecturers will be available on the course Moodle webpage.

4.2. Learning tasks

The course includes the following learning tasks:

  • Theoretical lectures, which explain theory and methods using slides, R Markdown documents, or analogous materials (and possibly videoconferencing tools, as required), organized according to the syllabus of the course.

  • Practical lectures, where examples of computational implementations of the analyses described in the theory sessions will be presented to the students, using a combination of slides, R Markdown documents, and code scripts.

  • Computer lab sessions, where students will be asked to solve specific problems and implement analytical pipelines, applying what was presented in the theoretical and practical lectures.

  • Short seminars, presented before the teacher towards the end of the course, on a previously proposed practical case.

4.3. Syllabus

The course is structured in six blocks:

Block 1: Reviewing the main fields and applications of data science in Bio-medicine & setting up the course.

Block 2: Gene expression modelling in bulk sequencing data.

Block 3: Complex designs, batch effects and biological interpretation of genomic analyses.

Block 4: Single-cell -omics technologies: data generation, and analytic strategies in complex tissues.

Block 5: Ethics and reproducibility in data analysis in Bio-medicine.

Block 6: Course recapitulation, and seminar presentations.

 

4.4. Course planning and calendar

The course is taught over 10 weeks in the second semester, indicatively between February 1st and May 19th.

Lectures will be held according to the schedule published at https://ciencias.unizar.es/calendario-y-horarios. Typically, each week will include two two-hour sessions: the first will be predominantly devoted to theoretical and practical lectures or seminars, whilst the second will generally be devoted to practices.

Students will be reminded of the precise dates and places via the Moodle virtual platform, so they are advised to check their official (unizar) email account.

Evaluations of the practice sessions will take place throughout the course, structured by blocks. Seminar schedules will be agreed with the students during the semester. The exam sessions will be held on the dates and at the places reported at https://ciencias.unizar.es/consultar-examenes

 

4.5. Bibliography and recommended resources

http://psfunizar10.unizar.es/br13/egAsignaturas.php?codigo=68461

